

Michael137
Member

@Michael137 Michael137 commented Oct 6, 2025

Currently llvm::dwarf::LanguageDescription returns a stringified DW_LNAME. It would be useful to have an API that returns the language name for a particular DW_LNAME_/version pair. LLDB's use case is that it wants to emit diagnostics with human-readable descriptions of the language it read from the debug info (see #161688). We could maintain a side table in LLDB, but this seemed generally useful and belongs next to the existing LanguageDescription API.

@llvmbot
Member

llvmbot commented Oct 6, 2025

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-debuginfo

Author: Michael Buch (Michael137)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/162048.diff

2 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Dwarf.h (+5)
  • (modified) llvm/lib/BinaryFormat/Dwarf.cpp (+111)
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 2c5012510a5c3..de29f14097fb8 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -500,8 +500,13 @@ toDW_LNAME(SourceLanguage language) {
   return {};
 }
 
+/// Returns a version-independent language name.
 LLVM_ABI llvm::StringRef LanguageDescription(SourceLanguageName name);
 
+/// Returns a language name corresponding to the specified version.
+LLVM_ABI llvm::StringRef LanguageDescription(SourceLanguageName Name,
+                                             uint32_t Version);
+
 inline bool isCPlusPlus(SourceLanguage S) {
   bool result = false;
   // Deliberately enumerate all the language options so we get a warning when
diff --git a/llvm/lib/BinaryFormat/Dwarf.cpp b/llvm/lib/BinaryFormat/Dwarf.cpp
index 8b24044e19e50..8e87c68424e00 100644
--- a/llvm/lib/BinaryFormat/Dwarf.cpp
+++ b/llvm/lib/BinaryFormat/Dwarf.cpp
@@ -472,6 +472,117 @@ StringRef llvm::dwarf::LanguageDescription(dwarf::SourceLanguageName lname) {
   return "Unknown";
 }
 
+StringRef llvm::dwarf::LanguageDescription(dwarf::SourceLanguageName Name,
+                                           uint32_t Version) {
+  switch (Name) {
+  // YYYY
+  case DW_LNAME_Ada: {
+    if (Version <= 1983)
+      return "Ada 83";
+    if (Version <= 1995)
+      return "Ada 95";
+    if (Version <= 2005)
+      return "Ada 2005";
+    if (Version <= 2012)
+      return "Ada 2012";
+  } break;
+
+  case DW_LNAME_Cobol: {
+    if (Version <= 1974)
+      return "COBOL-74";
+    if (Version <= 1985)
+      return "COBOL-85";
+  } break;
+
+  case DW_LNAME_Fortran: {
+    if (Version <= 1977)
+      return "FORTRAN 77";
+    if (Version <= 1990)
+      return "FORTRAN 90";
+    if (Version <= 1995)
+      return "Fortran 95";
+    if (Version <= 2003)
+      return "Fortran 2003";
+    if (Version <= 2008)
+      return "Fortran 2008";
+    if (Version <= 2018)
+      return "Fortran 2018";
+  } break;
+
+  // YYYYMM
+  case DW_LNAME_C: {
+    if (Version == 0)
+      return "K&R C";
+    if (Version <= 198912)
+      return "C89";
+    if (Version <= 199901)
+      return "C99";
+    if (Version <= 201112)
+      return "C11";
+    if (Version <= 201710)
+      return "C17";
+  } break;
+
+  case DW_LNAME_C_plus_plus: {
+    if (Version == 0)
+      break;
+    if (Version <= 199711)
+      return "C++98";
+    if (Version <= 200310)
+      return "C++03";
+    if (Version <= 201103)
+      return "C++11";
+    if (Version <= 201402)
+      return "C++14";
+    if (Version <= 201703)
+      return "C++17";
+    if (Version <= 202002)
+      return "C++20";
+  } break;
+
+  case DW_LNAME_ObjC_plus_plus:
+  case DW_LNAME_ObjC:
+  case DW_LNAME_Move:
+  case DW_LNAME_SYCL:
+  case DW_LNAME_BLISS:
+  case DW_LNAME_Crystal:
+  case DW_LNAME_D:
+  case DW_LNAME_Dylan:
+  case DW_LNAME_Go:
+  case DW_LNAME_Haskell:
+  case DW_LNAME_HLSL:
+  case DW_LNAME_Java:
+  case DW_LNAME_Julia:
+  case DW_LNAME_Kotlin:
+  case DW_LNAME_Modula2:
+  case DW_LNAME_Modula3:
+  case DW_LNAME_OCaml:
+  case DW_LNAME_OpenCL_C:
+  case DW_LNAME_Pascal:
+  case DW_LNAME_PLI:
+  case DW_LNAME_Python:
+  case DW_LNAME_RenderScript:
+  case DW_LNAME_Rust:
+  case DW_LNAME_Swift:
+  case DW_LNAME_UPC:
+  case DW_LNAME_Zig:
+  case DW_LNAME_Assembly:
+  case DW_LNAME_C_sharp:
+  case DW_LNAME_Mojo:
+  case DW_LNAME_GLSL:
+  case DW_LNAME_GLSL_ES:
+  case DW_LNAME_OpenCL_CPP:
+  case DW_LNAME_CPP_for_OpenCL:
+  case DW_LNAME_Ruby:
+  case DW_LNAME_Hylo:
+  case DW_LNAME_Metal:
+    break;
+  }
+
+  // Fallback to un-versioned name.
+  return LanguageDescription(Name);
+}
+
 StringRef llvm::dwarf::CaseString(unsigned Case) {
   switch (Case) {
   case DW_ID_case_sensitive:
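Every versioned branch in the diff follows the same shape: it checks ascending upper bounds, so the first bound that is >= Version wins. As a rough illustration, the C++ branch can be expressed as a sorted table searched with `std::lower_bound`. This is a standalone sketch, not LLVM code: `VersionedName`, `kCxxVersions`, `describeCxx`, and `describe` are hypothetical names, and the unversioned fallback is passed in as a parameter instead of calling `LanguageDescription(Name)`.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <string_view>

// Hypothetical table entry: the newest version value that still maps to Name.
struct VersionedName {
  uint32_t MaxVersion;
  std::string_view Name;
};

// Upper bounds copied from the DW_LNAME_C_plus_plus branch in the diff,
// sorted ascending so they can be binary-searched.
constexpr VersionedName kCxxVersions[] = {
    {199711, "C++98"}, {200310, "C++03"}, {201103, "C++11"},
    {201402, "C++14"}, {201703, "C++17"}, {202002, "C++20"},
};

// Finds the first entry whose upper bound is >= Version, so an in-between
// version is rounded up to the next known standard. Version 0 and versions
// newer than the table yield std::nullopt, mirroring the `break` paths.
std::optional<std::string_view> describeCxx(uint32_t Version) {
  if (Version == 0)
    return std::nullopt;
  const auto *It = std::lower_bound(
      std::begin(kCxxVersions), std::end(kCxxVersions), Version,
      [](const VersionedName &E, uint32_t V) { return E.MaxVersion < V; });
  if (It == std::end(kCxxVersions))
    return std::nullopt;
  return It->Name;
}

// Mirrors the "Fallback to un-versioned name" step at the end of the diff.
std::string_view describe(uint32_t Version, std::string_view Unversioned) {
  return describeCxx(Version).value_or(Unversioned);
}
```

A table keeps the bounds in one place, at the cost of a little indirection compared to the explicit if-chain the patch uses.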

Collaborator

@dwblaikie dwblaikie left a comment

Be good to have some test coverage - either as a unit test, or if there's some useful way to render this data in llvm-dwarfdump, even better.

Comment on lines +480 to +487
    if (Version <= 1983)
      return "Ada 83";
    if (Version <= 1995)
      return "Ada 95";
    if (Version <= 2005)
      return "Ada 2005";
    if (Version <= 2012)
      return "Ada 2012";
Collaborator

I'd have expected these to work the other way around - like if the Version is < 1995, then it's Ada 83.

It might be weird to say it's Ada 95 when the version is 1990, i.e. before 1995? But I'm not sure if there's a clear general sense, or language-specific sense, of what the intermediate values should mean.
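To make the two readings concrete, here is a small standalone sketch comparing them on the Ada bounds from the diff. `roundUp` reproduces the behavior of the patch; `roundDown` is a hypothetical implementation of the alternative described above. Neither function is LLVM API, and the plain `"Ada"` result is a placeholder for the unversioned fallback.

```cpp
#include <cstdint>
#include <string_view>

struct AdaRevision {
  uint32_t Year;
  std::string_view Name;
};

// Ada revisions from the DW_LNAME_Ada branch of the diff, oldest first.
constexpr AdaRevision kAda[] = {
    {1983, "Ada 83"}, {1995, "Ada 95"}, {2005, "Ada 2005"}, {2012, "Ada 2012"}};

// Reading used in the patch: the first revision whose year is >= Version.
std::string_view roundUp(uint32_t Version) {
  for (const AdaRevision &R : kAda)
    if (Version <= R.Year)
      return R.Name;
  return "Ada"; // placeholder for the unversioned fallback
}

// Alternative reading: the newest revision whose year is <= Version.
std::string_view roundDown(uint32_t Version) {
  std::string_view Result = "Ada"; // no known revision released yet
  for (const AdaRevision &R : kAda)
    if (R.Year <= Version)
      Result = R.Name;
  return Result;
}
```

Both policies agree on exact revision years such as 1995. They only differ for in-between values: 1990 is "Ada 95" when rounding up and "Ada 83" when rounding down.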

Member Author

@Michael137 Michael137 Oct 6, 2025

Yeah, I don't think those intermediate values have really been specified. I took inspiration from the logic in:

/// Convert a DWARF 6 pair of language name and version to a DWARF 5 DW_LANG.
/// If the version number doesn't exactly match a known version it is
/// rounded up to the next-highest known version number.
inline std::optional<SourceLanguage> toDW_LANG(SourceLanguageName name,
                                               uint32_t version) {
  switch (name) {
  case DW_LNAME_Ada: // YYYY
    if (version <= 1983)
      return DW_LANG_Ada83;
    if (version <= 1995)
      return DW_LANG_Ada95;
    if (version <= 2005)
      return DW_LANG_Ada2005;
    if (version <= 2012)
      return DW_LANG_Ada2012;
    return {};
  case DW_LNAME_BLISS:
    return DW_LANG_BLISS;
  case DW_LNAME_C: // YYYYMM, K&R 000000
    if (version == 0)
      return DW_LANG_C;
    if (version <= 198912)
      return DW_LANG_C89;
    if (version <= 199901)
      return DW_LANG_C99;
    if (version <= 201112)
      return DW_LANG_C11;
    if (version <= 201710)
      return DW_LANG_C17;
    return {};
  case DW_LNAME_C_plus_plus: // YYYYMM
    if (version == 0)
      return DW_LANG_C_plus_plus;
    if (version <= 199711)
      return DW_LANG_C_plus_plus;
    if (version <= 200310)
      return DW_LANG_C_plus_plus_03;
    if (version <= 201103)
      return DW_LANG_C_plus_plus_11;
    if (version <= 201402)
      return DW_LANG_C_plus_plus_14;
    if (version <= 201703)
      return DW_LANG_C_plus_plus_17;
    if (version <= 202002)
      return DW_LANG_C_plus_plus_20;
    return {};

There we round intermediate version numbers up. Reading https://dwarfstd.org/languages-v6.html, it sounds like the version number marks the completion of a particular language version, so anything after it belongs to the next version. At least that's how I think about it.

Collaborator

Seems like a reasonable way to deal with pre-release compilers that guess the final date of the standard incorrectly. The other alternative would be to put the raw version in the string.
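For comparison, here is a rough sketch of that alternative. It prints a canonical name only when the version exactly matches a known revision, and otherwise shows the raw number. The output format (`"C++ (version 201800)"`) and the names `KnownVersion`, `kCxx`, and `describeExactOrRaw` are hypothetical and were not proposed in the patch.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

struct KnownVersion {
  uint32_t Version;
  std::string_view Name;
};

// Exact YYYYMM values taken from the upper bounds in the C++ branch of the diff.
constexpr KnownVersion kCxx[] = {
    {199711, "C++98"}, {200310, "C++03"}, {201103, "C++11"},
    {201402, "C++14"}, {201703, "C++17"}, {202002, "C++20"},
};

// Returns the canonical name for an exact match. Otherwise it keeps the
// producer's raw version visible instead of guessing which standard was meant.
std::string describeExactOrRaw(uint32_t Version) {
  for (const KnownVersion &K : kCxx)
    if (K.Version == Version)
      return std::string(K.Name);
  return "C++ (version " + std::to_string(Version) + ")";
}
```

The trade-off is that a pre-release compiler emitting, say, 201800 gets an accurate but less friendly string, whereas rounding up would report "C++20".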

@Michael137
Member Author

Be good to have some test coverage - either as a unit test, or if there's some useful way to render this data in llvm-dwarfdump, even better.

Added unit tests in the latest commit. Testing via llvm-dwarfdump isn't possible yet because LLVM doesn't emit version names (LLDB synthesizes them from the DW_LANG_ codes). Once we implement the DWARFv6 language version scheme (which I'm working on but don't have anything ready to review yet), we can test this further.

Collaborator

@adrian-prantl adrian-prantl left a comment

I don't have super strong opinions about the rounding, but it seems reasonable to me.

});
}

TEST(DWARFDebugInfo, TestLanguageDescription_Versioned) {
Collaborator

There's probably some kind of table-based testing approach that might be tidier?

Member Author

Hmm, agreed that it would be nice to add more coverage. I'm not sure offhand how we'd do this with a table, though, without duplicating the language strings (which would kind of defeat the purpose of the test).

Collaborator

Not sure I follow.

What I was picturing was a GoogleTest TEST_P (see, for example, DumpValueFixture in llvm/unittests/DebugInfo/DWARF/DWARFFormValueTest.cpp).
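The idea is a table-driven (value-parameterized) test. Each row pairs an input version with the expected string, and one test body checks every row. Below is a minimal sketch of that shape, written without GoogleTest so it is self-contained. `describeC` is a hypothetical stand-in for `llvm::dwarf::LanguageDescription(Name, Version)` that covers only the C branch of the diff, and its `"C"` result is a placeholder for the unversioned fallback. With GoogleTest, the `kCases` array would instead feed `INSTANTIATE_TEST_SUITE_P` through `::testing::ValuesIn`.

```cpp
#include <cstdint>
#include <cstdio>
#include <string_view>

// Hypothetical stand-in covering only the DW_LNAME_C branch of the diff.
std::string_view describeC(uint32_t Version) {
  if (Version == 0)
    return "K&R C";
  if (Version <= 198912)
    return "C89";
  if (Version <= 199901)
    return "C99";
  if (Version <= 201112)
    return "C11";
  if (Version <= 201710)
    return "C17";
  return "C"; // placeholder for the unversioned fallback
}

struct Case {
  uint32_t Version;
  std::string_view Expected;
};

// One row per input; boundary values and in-between values sit side by side.
constexpr Case kCases[] = {
    {0, "K&R C"},    {198912, "C89"}, {199000, "C99"}, {199901, "C99"},
    {201112, "C11"}, {201710, "C17"}, {202311, "C"},
};

// Checks every row and returns the number of mismatches, printing each one.
int runCases() {
  int Failures = 0;
  for (const Case &C : kCases) {
    std::string_view Got = describeC(C.Version);
    if (Got != C.Expected) {
      std::printf("version %u: expected %.*s, got %.*s\n",
                  static_cast<unsigned>(C.Version),
                  static_cast<int>(C.Expected.size()), C.Expected.data(),
                  static_cast<int>(Got.size()), Got.data());
      ++Failures;
    }
  }
  return Failures;
}
```

The expected strings still appear in the table, but adding a boundary case becomes a one-line change rather than a new block of assertions.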

Member Author

Oh, I see what you mean. Yeah, we can definitely make it a parameterized test.

Will do in a follow-up!

Currently `llvm::dwarf::LanguageDescription` returns a stringified
`DW_LNAME`. It would be useful to have an API that returns the language
name for a particular `DW_LNAME_`/version pair. LLDB's use case is that
it wants to display a human-readable description of the language we got
from debug-info in diagnostics. We could maintain a side-table in LLDB,
but thought this would be generally useful and belongs next to the
existing `LanguageDescription` API.
@Michael137 Michael137 force-pushed the llvm/dwarf-versioned-language-name branch from f322710 to 924d111 October 8, 2025 16:55
@Michael137 Michael137 enabled auto-merge (squash) October 8, 2025 17:26
@Michael137 Michael137 merged commit 030d8e6 into llvm:main Oct 8, 2025
7 of 8 checks passed
@Michael137 Michael137 deleted the llvm/dwarf-versioned-language-name branch October 8, 2025 17:34
Michael137 added a commit that referenced this pull request Oct 8, 2025
…sion (#162050)

Depends on #162048

This makes sure we also include the version number in the description.

For `C++17`, this would, e.g., now return `"C++17"` instead of `"ISO C++"`.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 8, 2025
…anguage version (#162050)

Depends on llvm/llvm-project#162048

This makes sure we also include the version number in the description.

For `C++17`, this would, e.g., now return `"C++17"` instead of `"ISO C++"`.